 ai text detector


Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text

Cheng, Yize, Sadasivan, Vinu Sankar, Saberi, Mehrdad, Saha, Shoumik, Feizi, Soheil

arXiv.org Artificial Intelligence

The increasing capabilities of Large Language Models (LLMs) have raised concerns about their misuse in AI-generated plagiarism and social engineering. While various AI-generated text detectors have been proposed to mitigate these risks, many remain vulnerable to simple evasion techniques such as paraphrasing. However, recent detectors have shown greater robustness against such basic attacks. In this work, we introduce Adversarial Paraphrasing, a training-free attack framework that universally humanizes any AI-generated text to evade detection more effectively. Our approach leverages an off-the-shelf instruction-following LLM to paraphrase AI-generated content under the guidance of an AI text detector, producing adversarial examples that are specifically optimized to bypass detection. Extensive experiments show that our attack is both broadly effective and highly transferable across several detection systems. For instance, compared to a simple paraphrasing attack--which, ironically, increases the true positive rate at a 1% false positive rate (T@1%F) by 8.57% on RADAR and 15.03% on Fast-DetectGPT--adversarial paraphrasing, guided by OpenAI-RoBERTa-Large, reduces T@1%F by 64.49% on RADAR and a striking 98.96% on Fast-DetectGPT. Across a diverse set of detectors--including neural network-based, watermark-based, and zero-shot approaches--our attack achieves an average T@1%F reduction of 87.88% under the guidance of OpenAI-RoBERTa-Large. We also analyze the tradeoff between text quality and attack success and find that our method can significantly reduce detection rates, mostly with only a slight degradation in text quality. Our adversarial setup highlights the need for more robust and resilient detection strategies in light of increasingly sophisticated evasion techniques.
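
The T@1%F metric quoted in this abstract can be illustrated with a small sketch. It assumes the detector returns a score where higher means "more likely AI-generated", and it picks the threshold from human-written scores so that at most 1% of them are flagged. The function name `tpr_at_fpr` and the toy scores are hypothetical and are not taken from the paper.

```python
import math


def tpr_at_fpr(human_scores, ai_scores, target_fpr=0.01):
    """True positive rate at a fixed false positive rate.

    Assumption: a higher score means the detector thinks the text is
    more likely AI-generated. A text is flagged when its score is
    strictly greater than the threshold.
    """
    if not human_scores or not ai_scores:
        raise ValueError("both score lists must be non-empty")

    # Sort human-written scores from most to least suspicious.
    ranked = sorted(human_scores, reverse=True)

    # How many human texts may be flagged without exceeding the target FPR.
    # The small epsilon guards against float rounding in the product.
    allowed = math.floor(target_fpr * len(ranked) + 1e-9)
    allowed = min(allowed, len(ranked) - 1)

    # At most `allowed` human scores are strictly above this threshold.
    threshold = ranked[allowed]

    flagged_ai = sum(1 for s in ai_scores if s > threshold)
    return flagged_ai / len(ai_scores)


# Toy data: 100 human scores spread evenly over [0, 0.99].
human = [i / 100 for i in range(100)]
ai = [0.995, 0.985, 0.5, 0.2]
rate = tpr_at_fpr(human, ai, target_fpr=0.01)
```

Fixing the false positive rate first is what makes detectors comparable: each one is judged on how much AI-generated text it catches while wrongly flagging the same small share of human writing.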


AuthorMist: Evading AI Text Detectors with Reinforcement Learning

David, Isaac, Gervais, Arthur

arXiv.org Artificial Intelligence

In the age of powerful AI-generated text, automatic detectors have emerged to identify machine-written content. This poses a threat to author privacy and freedom, as text authored with AI assistance may be unfairly flagged. We propose AuthorMist, a novel reinforcement learning-based system to transform AI-generated text into human-like writing. AuthorMist leverages a 3-billion-parameter language model as a backbone, fine-tuned with Group Relative Policy Optimization (GRPO) to paraphrase text in a way that evades AI detectors. Our framework establishes a generic approach where external detector APIs (GPTZero, WinstonAI, Originality.ai, etc.) serve as reward functions within the reinforcement learning loop, enabling the model to systematically learn outputs that these detectors are less likely to classify as AI-generated. This API-as-reward methodology can be applied broadly to optimize text against any detector with an accessible interface. Experiments on multiple datasets and detectors demonstrate that AuthorMist effectively reduces the detectability of AI-generated text while preserving the original meaning. Our evaluation shows attack success rates ranging from 78.6% to 96.2% against individual detectors, significantly outperforming baseline paraphrasing methods. AuthorMist maintains high semantic similarity (above 0.94) with the original text while successfully evading detection. These results highlight limitations in current AI text detection technologies and raise questions about the sustainability of the detection-evasion arms race.


DUPE: Detection Undermining via Prompt Engineering for Deepfake Text

Weichert, James, Dimobi, Chinecherem

arXiv.org Artificial Intelligence

As large language models (LLMs) become increasingly commonplace, concern about distinguishing between human and AI text increases as well. The growing power of these models is of particular concern to teachers, who may worry that students will use LLMs to write school assignments. Facing a technology with which they are unfamiliar, teachers may turn to publicly available AI text detectors. Yet the accuracy of many of these detectors has not been thoroughly verified, posing potential harm to students who are falsely accused of academic dishonesty. In this paper, we evaluate three different AI text detectors--Kirchenbauer et al. watermarks, ZeroGPT, and GPTZero--against human and AI-generated essays. We find that watermarking results in a high false positive rate, and that ZeroGPT has both high false positive and false negative rates. Further, we are able to significantly increase the false negative rate of all detectors by using ChatGPT 3.5 to paraphrase the original AI-generated texts, thereby effectively bypassing the detectors.
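
The false positive and false negative rates discussed in this abstract can be computed as in the sketch below. It assumes each essay has a ground-truth label and a detector verdict, both encoded as booleans where `True` means AI-generated. The function name `error_rates` and the sample data are hypothetical and do not reproduce the paper's evaluation.

```python
def error_rates(labels, predictions):
    """Return (false_positive_rate, false_negative_rate).

    Assumption: True means "AI-generated" for labels and
    "flagged as AI" for predictions.
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")

    n_human = sum(1 for y in labels if not y)
    n_ai = sum(1 for y in labels if y)
    if n_human == 0 or n_ai == 0:
        raise ValueError("need at least one human and one AI example")

    # Human-written essays wrongly flagged as AI.
    false_pos = sum(1 for y, p in zip(labels, predictions) if not y and p)
    # AI-generated essays that slipped past the detector.
    false_neg = sum(1 for y, p in zip(labels, predictions) if y and not p)

    return false_pos / n_human, false_neg / n_ai


# Toy example: four human essays followed by four AI essays.
labels = [False, False, False, False, True, True, True, True]
verdicts = [True, False, False, False, True, True, False, False]
fpr, fnr = error_rates(labels, verdicts)
```

In the classroom setting the abstract describes, the false positive rate is the share of students who wrote their own essay but would still be flagged, while the false negative rate is the share of AI-written essays that go undetected.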


GenAI Detection Tools, Adversarial Techniques and Implications for Inclusivity in Higher Education

Perkins, Mike, Roe, Jasper, Vu, Binh H., Postma, Darius, Hickerson, Don, McGaughran, James, Khuat, Huy Q.

arXiv.org Artificial Intelligence

This study investigates the efficacy of six major Generative AI (GenAI) text detectors when confronted with machine-generated content that has been modified using techniques designed to evade detection by these tools (n=805). The results demonstrate that the detectors' already low accuracy rates (39.5%) show major reductions in accuracy (17.4%) when faced with manipulated content, with some techniques proving more effective than others in evading detection. The accuracy limitations and the potential for false accusations demonstrate that these tools cannot currently be recommended for determining whether violations of academic integrity have occurred, underscoring the challenges educators face in maintaining inclusive and fair assessment practices. However, they may have a role in supporting student learning and maintaining academic integrity when used in a non-punitive manner. These results underscore the need for a combined approach to addressing the challenges posed by GenAI in academia to promote the responsible and equitable use of these emerging technologies. The study concludes that the current limitations of AI text detectors require a critical approach for any possible implementation in higher education (HE) and highlight possible alternatives to AI assessment strategies.


Most sites claiming to catch AI-written text fail spectacularly • TechCrunch

#artificialintelligence

As the fervor around generative AI grows, critics have called on the creators of the tech to take steps to mitigate its potentially harmful effects. Text-generating AI in particular has gotten a lot of attention -- and with good reason. Students could use it to plagiarize, content farms could use it to spam, and bad actors could use it to spread misinformation. OpenAI bowed to pressure several weeks ago, releasing a classifier tool that attempts to distinguish between human-written and synthetic text. But it's not particularly accurate; OpenAI estimates that it misses 74% of AI-generated text. In the absence of a reliable way to spot text originating from an AI, a cottage industry of detector services has sprung up.